-
This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing its estimates in real time. The PSGLD algorithm is passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (the forward learner), over which it has no control. The PSGLD algorithm acts as a randomized sampler that achieves adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance: we provide explicit bounds on the 2-Wasserstein distance between the PSGLD algorithm's sample measure and the stationary measure encoding the cost function, and we provide guarantees for a kernel density estimation scheme that reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance: they provide finite-time guarantees for an adaptive IRL mechanism and substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.
Free, publicly-accessible full text available June 1, 2026.
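The passive sampling idea can be illustrated with a minimal one-dimensional sketch. This is not the paper's algorithm: the quadratic cost in grad_J, the uniform query distribution of the forward learner, the kernel bandwidth, and the projection interval are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_J(x):
    # Forward learner's cost gradient; J(x) = (x - 1)^2, so argmin J = 1.
    return 2.0 * (x - 1.0)

beta, step, h = 2.0, 0.005, 0.3      # inverse temperature, step size, kernel bandwidth
lo, hi = -2.0, 4.0                   # compact domain; iterates are projected onto it
x, samples = 0.0, []
for _ in range(200_000):
    theta = rng.uniform(lo, hi)                # forward learner's query (assumed to explore)
    g = grad_J(theta) + rng.normal(scale=0.1)  # noisy gradient, observed passively
    # Kernel weight: the observed gradient only influences the sampler when
    # the query point theta is close to the sampler's own iterate x.
    w = np.exp(-0.5 * ((x - theta) / h) ** 2)
    x += -step * w * g + np.sqrt(2.0 * step / beta) * rng.normal()
    x = min(max(x, lo), hi)                    # projection keeps the diffusion stable
    samples.append(x)

# The sample measure concentrates around argmin J = 1; a kernel density
# estimate over `samples` would then recover J up to scale and a constant.
est_min = float(np.mean(samples[5000:]))
```

The sampler never controls where the forward learner queries; it only reweights the gradients it happens to observe, which is the "passive" mechanism the abstract describes.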
-
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and stochastic control algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: 1) centrally controlled LLMAs and 2) autonomous LLMAs with incentives. Throughout the paper, we numerically demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and Mistral and closed-source models like ChatGPT. The main takeaway of this paper, based on substantial empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting. 
Traditionally, such models are used in economics to study interacting human decision-makers.
Free, publicly-accessible full text available January 1, 2026.
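The herding behavior described above can be illustrated with a textbook sequential Bayesian social-learning toy (not the paper's LLMA model; the binary state, signal accuracy, and agent count are illustrative assumptions). Each agent fuses a shared public belief with a private signal; once the public belief is strong enough, actions stop revealing signals and an information cascade (herd) begins.

```python
import numpy as np

rng = np.random.default_rng(1)

p = 0.7            # private-signal accuracy (assumed)
true_state = 1
public = 0.5       # shared belief P(state = 1) formed from observed actions
actions = []

for _ in range(30):
    signal = true_state if rng.random() < p else 1 - true_state
    # Agent fuses the public belief with its private signal via Bayes rule ...
    l1 = p if signal == 1 else 1 - p
    l0 = 1 - p if signal == 1 else p
    private = public * l1 / (public * l1 + (1 - public) * l0)
    action = int(private > 0.5)          # ... and acts on the posterior
    actions.append(action)
    # Later agents observe only the action.  Once the public belief leaves
    # (1-p, p), the action no longer depends on the signal: a cascade starts
    # and the public belief freezes (herding).
    if public > p or public < 1 - p:
        continue
    l1 = p if action == 1 else 1 - p
    l0 = 1 - p if action == 1 else p
    public = public * l1 / (public * l1 + (1 - public) * l0)
```

After the cascade starts, all subsequent actions are identical regardless of private signals, which is the behavior the paper's stochastic control framework seeks to delay.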
-
How can one design a Markov decision process (MDP)-based radar controller that makes small sacrifices in performance to mask its sensing plan from an adversary? The radar controller purposefully minimizes the Fisher information of its emissions so that an adversary cannot identify the controller's model parameters accurately. Unlike classical open-loop statistical inference, where the Fisher information serves as a lower bound for the achievable covariance, this article employs the Fisher information as a design constraint for a closed-loop radar controller to mask its sensing plan. We analytically derive a closed-form expression for the determinant of the Fisher information matrix (FIM) pertaining to the parameters of the MDP-based controller. Subsequently, we formulate an MDP regularized by the determinant of the adversary's FIM. This results in perturbations to the total cost of operation, the state-action costs, and the transition matrix. In addition, we propose a maximum-entropy-based convex lower bound for the FIM-constrained MDP, whose solution can serve as an initialization point for the proposed nonlinear optimization problem. This convex lower bound aligns with existing work that employs the same maximum-entropy criterion to mask the sensing plan. Numerical results show that introducing minor perturbations to the MDP's state-action costs, transition kernel, and total operation cost can reduce the Fisher information of the emissions. Consequently, this reduction amplifies the variability in policy and transition kernel estimation errors, thwarting the adversary's accuracy in estimating the controller's sensing plan. We demonstrate this by comparing the error in the estimate of the transition kernel under different criteria: the FIM criterion, the maximum-entropy criterion, a sensing plan where actions are chosen uniformly, and the unmasked sensing plan.
Free, publicly-accessible full text available February 1, 2026.
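The link between FIM minimization and maximum entropy can be seen in a small standalone fact (illustrative, not the article's MDP construction): for a single categorical emission distribution with probabilities p, the determinant of the FIM equals 1/prod(p), so any smoothing toward the uniform distribution shrinks it, and the uniform plan minimizes it.

```python
import numpy as np

def fim_det(p):
    # Fisher information matrix of a categorical distribution with free
    # parameters p[0..K-2]; its determinant works out to 1 / prod(p).
    p = np.asarray(p, dtype=float)
    I = np.diag(1.0 / p[:-1]) + 1.0 / p[-1]   # diag(1/p_i) + (1/p_K) * ones
    return float(np.linalg.det(I))

sharp = np.array([0.90, 0.05, 0.05])            # near-deterministic sensing plan
perturbed = 0.8 * sharp + 0.2 * np.ones(3) / 3  # minor perturbation toward uniform
uniform = np.ones(3) / 3
```

Even the mild 20% smoothing above cuts the FIM determinant by roughly a factor of four, mirroring the article's observation that minor perturbations suffice to degrade the adversary's estimation accuracy.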
-
Estimating exposure to information on a social network is a problem with important consequences for our society. The exposure estimation problem involves finding the fraction of people on the network who have been exposed to a piece of information (e.g., a URL of a news article on Facebook, a hashtag on Twitter). The exact value of exposure to a piece of information is determined by two features: the structure of the underlying social network and the set of people who shared the piece of information. Often, both features are not publicly available (i.e., access to the two features is limited only to the internal administrators of the platform) and are difficult to estimate from data. As a solution, we propose two methods to estimate the exposure to a piece of information in an unbiased manner: a vanilla method that is based on sampling the network uniformly and a method that non-uniformly samples the network motivated by the Friendship Paradox. We provide theoretical results that characterize the conditions (in terms of properties of the network and the piece of information) under which one method outperforms the other. Further, we outline extensions of the proposed methods to dynamic information cascades (where the exposure needs to be tracked in real time). We demonstrate the practical feasibility of the proposed methods via experiments on multiple synthetic and real-world datasets.
Free, publicly-accessible full text available December 31, 2025.
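The two estimators can be sketched on a toy graph (the graph, sharer set, and sample sizes are illustrative assumptions; the paper's estimators are stated in general form). The friend-based estimator samples nodes with probability proportional to degree and debiases with the importance weight mean_degree/degree, which keeps it unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy graph (adjacency list); `sharers` are the nodes that posted the item.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1], 4: [2, 5], 5: [4]}
sharers = {1}
exposed = {v for v, nbrs in adj.items() if set(nbrs) & sharers}
true_exposure = len(exposed) / len(adj)   # fraction with a sharing neighbor

nodes = list(adj)
deg = np.array([len(adj[v]) for v in nodes], dtype=float)
mean_deg = float(deg.mean())

def vanilla(n):
    # Uniform node sampling: trivially unbiased for the exposure fraction.
    s = rng.choice(nodes, size=n)
    return float(np.mean([v in exposed for v in s]))

def friend_based(n):
    # Degree-biased ("random friend") sampling, debiased by the importance
    # weight mean_deg / deg(v); unbiased because p(v) = deg(v) / sum(deg).
    s = rng.choice(nodes, size=n, p=deg / deg.sum())
    return float(np.mean([(mean_deg / len(adj[v])) * (v in exposed) for v in s]))
```

Both estimators converge to the same true exposure; which has lower variance depends on how exposure correlates with degree, which is exactly the trade-off the paper's theoretical results characterize.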
-
This article introduces a novel mutual information-based measure to assess the glass ceiling effect in preferential attachment networks, advancing the analysis of inequalities in attributed networks. Using Shannon entropy, and generalizing to Rényi entropy, our measure evaluates the conditional probability distributions of node attributes given the degrees of adjacent nodes, offering a more nuanced understanding of inequality than traditional methods that emphasize node degree distributions and degree assortativity alone. To evaluate the efficacy of the proposed measure, we test it against an analytical structural inequality model as well as historical publication data. Results show that our mutual information measure aligns well with both the theoretical model and the empirical data, underscoring its reliability as a robust approach for capturing inequalities in attributed networks. Moreover, we introduce a novel stochastic optimization algorithm that utilizes a parameterized conditional logit model for edge addition. Our algorithm is shown to outperform the baseline uniform-distribution-based approach in mitigating the glass ceiling effect. By strategically recommending links based on this algorithm, we can effectively hinder the glass ceiling effect within networks.
Free, publicly-accessible full text available December 1, 2025.
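A plug-in Shannon version of the measure can be sketched as follows (the tiny attributed graph is an illustrative assumption, and this empirical-count estimator is only a sketch of the idea): estimate the mutual information between a node's attribute and the degree of the node at the other end of each edge.

```python
import numpy as np
from collections import Counter

# Attributed edge list: each node carries a binary group label.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5), (1, 3)]
attr = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}

deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

# One (attribute of endpoint, degree of the other endpoint) observation per
# directed edge: the joint distribution the measure is built from.
pairs = []
for u, v in edges:
    pairs += [(attr[u], deg[v]), (attr[v], deg[u])]

def entropy(counts):
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log2(p)))

H_attr = entropy(Counter(a for a, _ in pairs))
H_deg = entropy(Counter(d for _, d in pairs))
H_joint = entropy(Counter(pairs))
mi = H_attr + H_deg - H_joint   # I(attribute; adjacent-node degree), in bits
```

A large value of `mi` indicates that knowing a neighbor's degree is informative about a node's group, the kind of attribute-degree coupling the glass ceiling effect produces.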
-
How can non-communicating agents learn to share congested resources efficiently? This is a challenging task when the agents can access the same resource simultaneously (in contrast to multi-agent multi-armed bandit problems) and the resource valuations differ among agents. We present a fully distributed algorithm for learning to share in congested environments and prove that the agents' regret with respect to the optimal allocation is poly-logarithmic in the time horizon. Performance in the non-asymptotic regime is illustrated in numerical simulations. The distributed algorithm has applications in cloud computing and spectrum sharing.
Free, publicly-accessible full text available November 1, 2025.
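A flavor of communication-free coordination is given by the classic "musical chairs" orthogonalization subroutine, sketched below. This is not the paper's algorithm (which handles simultaneous access and heterogeneous valuations); it only illustrates how collisions alone can drive non-communicating agents to a conflict-free allocation.

```python
import numpy as np

rng = np.random.default_rng(3)

# n agents, n resources, no communication.  Each round, an agent keeps its
# resource if it was alone on it; otherwise it resamples uniformly at random.
n = 5
choices = rng.integers(0, n, size=n)
rounds = 0
while rounds < 1000:
    counts = np.bincount(choices, minlength=n)
    collided = counts[choices] > 1          # agents sharing a resource
    if not collided.any():
        break                               # conflict-free allocation reached
    choices[collided] = rng.integers(0, n, size=int(collided.sum()))
    rounds += 1
```

Each round gives a positive probability of full orthogonalization, so all collisions resolve quickly with high probability; learning which allocation is optimal on top of this is where the paper's regret analysis comes in.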
-
A learner aims to minimize a function f by repeatedly querying a distributed oracle that provides noisy gradient evaluations. At the same time, the learner seeks to hide arg min f from a malicious eavesdropper that observes the learner's queries. This paper considers the problem of covert or learner-private optimization, where the learner has to dynamically choose between learning and obfuscation by exploiting the stochasticity of the gradient algorithm. The problem of controlling the stochastic gradient algorithm for covert optimization is modeled as a Markov decision process, and we show that the dynamic programming operator has a supermodular structure, implying that the optimal policy has a monotone threshold structure. A computationally efficient policy gradient algorithm is proposed to search for the optimal querying policy without knowledge of the transition probabilities. As a practical application, our methods are demonstrated on a hate speech classification task in a federated setting, where an eavesdropper can use the optimal weights to generate toxic content that is more easily misclassified. Numerical results show that when the learner uses the optimal policy, an eavesdropper can only achieve a validation accuracy of 52% with no information and 69% when it has a public dataset with 10% positive samples, compared to 83% when the learner employs a greedy policy.
-
This letter studies how a stochastic gradient (SG) algorithm can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG. The oracle probabilistically returns either a noisy gradient of the function or a non-informative measurement, depending on the oracle state and the incentive. The learner's query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This letter formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using interval-dominance conditions on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi-armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.
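Because the optimal policy is a threshold, the policy search reduces to a one-dimensional bandit over candidate thresholds. The sketch below uses a plain epsilon-greedy bandit on a synthetic noisy cost (the cost function, noise level, and epsilon-greedy rule are illustrative assumptions, standing in for the letter's stochastic approximation and bandit schemes).

```python
import numpy as np

rng = np.random.default_rng(4)

K = 8                       # candidate threshold levels 0..K-1
def noisy_cost(k):
    # Simulated noisy evaluation of a threshold policy's cost; the (hidden)
    # true cost is convex in k with minimum at k* = 3.
    return (k - 3) ** 2 + rng.normal(scale=1.0)

counts = np.zeros(K)
means = np.zeros(K)
for _ in range(4000):
    if rng.random() < 0.1:                  # explore a random threshold
        k = int(rng.integers(K))
    else:                                   # exploit the best estimate so far
        k = int(np.argmin(means))
    c = noisy_cost(k)
    counts[k] += 1
    means[k] += (c - means[k]) / counts[k]  # running-average update

best_threshold = int(np.argmin(means))
```

Restricting the search to threshold policies is what makes this tractable: instead of optimizing over all stationary policies, only K scalar arms need to be compared.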
-
Can deep convolutional neural networks (CNNs) for image classification be interpreted as utility maximizers with information costs? By performing set-valued system identification for Bayesian decision systems, we demonstrate that deep CNNs behave equivalently (in terms of necessary and sufficient conditions) to rationally inattentive Bayesian utility maximizers, a generative model used extensively in economics for human decision-making. Our claim is based on approximately 500 numerical experiments on 5 widely used neural network architectures. The parameters of the resulting interpretable model are computed efficiently via convex feasibility algorithms. As a practical application, we also illustrate how the reconstructed interpretable model can predict the classification performance of deep CNNs with high accuracy. The theoretical foundation of our approach lies in Bayesian revealed preference, studied in microeconomics. All our results are on GitHub and completely reproducible.
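One ingredient of such revealed-preference tests is the "no improving action switches" (NIAS) condition: a candidate utility rationalizes observed behavior only if every chosen action maximizes posterior expected utility. The sketch below checks NIAS margins on a two-state, two-action toy (the prior, conditional action probabilities, and candidate utilities are illustrative assumptions, not the paper's convex feasibility formulation).

```python
import numpy as np

prior = np.array([0.5, 0.5])          # prior over two states
p_a_given_x = np.array([[0.8, 0.2],   # observed action probs in state 0
                        [0.3, 0.7]])  # observed action probs in state 1

joint = prior[:, None] * p_a_given_x          # p(x, a)
post = joint / joint.sum(axis=0, keepdims=True)  # posterior p(x | a), columns over a

def nias_margin(u):
    # min over actions a of: E[u(x, a) | a] - max_b E[u(x, b) | a].
    # NIAS holds (behavior is rationalized) iff this margin is >= 0.
    margins = []
    for a in range(u.shape[1]):
        ev = post[:, a] @ u                   # expected utility of each action b
        margins.append(float(ev[a] - ev.max()))
    return min(margins)

u_good = np.eye(2)        # hypothetical utility: reward correct classification
u_bad = 1.0 - np.eye(2)   # hypothetical utility: reward misclassification
```

Here `u_good` passes NIAS while `u_bad` fails it; the paper's set-valued identification searches for the whole set of utilities and information costs passing such conditions via convex feasibility.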
-
A metacognitive radar switches between two modes of cognition: one mode to achieve high-quality estimates of targets, and the other to hide its utility function (plan). To achieve high-quality estimates of targets, a cognitive radar performs a constrained utility maximization to adapt its sensing mode in response to a changing target environment. If an adversary can estimate the utility function of a cognitive radar, it can determine the radar's sensing strategy and mitigate the radar's performance via electronic countermeasures (ECM). This article discusses a metacognitive radar that switches between two modes of cognition: achieving satisfactory estimates of a target while hiding its strategy from an adversary that detects cognition. The radar does so by transmitting purposefully designed suboptimal responses to spoof the adversary's Neyman-Pearson detector. We provide theoretical guarantees by ensuring that the Type-I error probability of the adversary's detector exceeds a predefined level for a specified tolerance on the radar's performance loss. We illustrate our cognition-masking scheme via numerical examples involving waveform adaptation and beam allocation. We show that small purposeful deviations from the optimal emission confuse the adversary by significant amounts, thereby masking the radar's cognition. Our approach uses ideas from revealed preference in microeconomics and adversarial inverse reinforcement learning. Our proposed algorithms provide a principled approach for system-level electronic counter-countermeasures to hide the radar's strategy from an adversary. We also provide performance bounds for our cognition-masking scheme when the adversary has misspecified measurements of the radar's response.
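The cost-of-masking trade-off can be illustrated with a one-dimensional toy (not the article's Neyman-Pearson construction; the quadratic performance loss, the adversary's window detector, and the noise level are illustrative assumptions): a small purposeful deviation from the optimal emission collapses the adversary's cognition-detection rate while the radar's loss stays quadratic in the deviation.

```python
import numpy as np

rng = np.random.default_rng(5)

# The radar's performance-optimal emission is x_opt; the adversary declares
# "cognitive radar" when the observed emission lies within tau of x_opt.
x_opt, tau, sigma = 1.0, 0.2, 0.1

def detect_rate(delta, n=20000):
    # Monte Carlo rate at which the adversary flags cognition when the radar
    # purposefully shifts its emission by delta (sigma = measurement noise).
    obs = x_opt + delta + sigma * rng.normal(size=n)
    return float(np.mean(np.abs(obs - x_opt) < tau))

rate_honest = detect_rate(0.0)   # optimal emissions: cognition almost always flagged
rate_masked = detect_rate(0.3)   # small deviation: detection collapses
perf_loss = 0.3 ** 2             # quadratic loss of the deviated emission
```

The deviation of 0.3 costs the radar only 0.09 in this toy quadratic objective but drives the adversary's detection rate from above 90% to below 20%, the qualitative effect the article quantifies with formal Type-I error guarantees.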